ABSTRACT

Many applications continuously log large numbers of events.
Extracting interesting knowledge from logged events is an
emerging, active research area in data mining. In this context,
we propose an approach for mining frequent events and
association rules from events logged in XML format. The
approach is composed of two main phases: I) constructing a
novel tree structure called the Frequency XML-based Tree (FXT),
which contains the frequency of the events to be mined; II)
querying the constructed FXT using XQuery to discover frequent
itemsets and association rules. The FXT is constructed with a
single pass over the logged data. We implement the proposed
algorithm and study various performance issues. The performance
study shows that the algorithm is efficient both for
constructing the FXT and for discovering association rules.

General Terms

Algorithms, Performance 

Keywords 

Mining logged events, XML mining, frequent itemsets,
association rules.

1. INTRODUCTION

Recently, the Extensible Markup Language (XML) has become
widely used as the de facto standard for representing,
exchanging, modeling, and maintaining semi-structured data. The
spread of XML-based applications and the increasing amount of
XML data pose several challenges for mining XML data. Modern
XML-based applications continuously log huge numbers of events
in real time. The logged event data describe the status of each
application component and can be used to trace application
activities. Applications that log events in XML format range
from scientific to business and financial applications.
Examples include XML-based data warehousing, web
personalization and web-click logs, geographic information
systems, and e-commerce. Mining and analyzing the events logged
by such applications helps to achieve self-managing systems.
Therefore, mining XML-formatted logged events is becoming
increasingly important and deserves close attention from the
database, data warehousing, data mining, and machine learning
research communities.

Mining logged events is the process of extracting knowledge
from continuous, rapidly arriving logged events. One of the most
important data mining techniques is association rule mining.
Association rule mining discovers interesting association
and/or correlation relationships among large sets of logged
events, and predicts upcoming events based on the occurrence of
previous ones. Mining association rules from incremental
XML-formatted logged events differs from mining traditional
static data, due to several specific issues
and challenges either related to data arrival [sIIt], or XML- 
formatting nature [4 

Logged events arrive continuously, at moderate or high speed,
in unbounded amounts, and with changing data distributions.
Unlike in traditional data mining, there is not enough time to
rescan the whole database whenever an update occurs. Therefore,
a single pass over the events is required. Logged events need to
be processed incrementally and as fast as possible: the
processing speed should exceed the event arrival rate. Moreover,
mined data should not need to be recalculated each time it is
requested. The unbounded amount of logged events and limited
system resources, such as disk storage, memory, and CPU power,
call for event mining algorithms that adapt themselves to the
available resources; otherwise, the accuracy of the results
decreases. Also, while traditional data mining techniques mine
frequent itemsets and discard non-frequent itemsets, this is not
appropriate for logged events, where the frequency of itemsets
changes over time. On the other hand, extracting knowledge from
XML data is more difficult than from operational data, because
of the flexible, irregular, and semi-structured nature of XML
data.

To the best of our knowledge, no algorithm has been proposed in
the literature to discover interesting knowledge from
incremental XML-formatted logged events. Therefore, we propose
in this paper an incremental algorithm for this purpose. Our
algorithm is composed of two main phases. First, we construct a
new tree structure called the Frequency XML-based Tree (FXT)
that stores the frequencies of the events to be mined. Second,
we query frequent event sets and association rules efficiently
from the constructed FXT using XQuery. Our algorithm addresses
most of the issues of processing logged events. It requires only
a single pass over the data transactions to construct the
compact FXT structure. Although the FXT is processed using XML
technologies and constructed in XML format, its construction is
fast. Association rules with different minimum supports can be
queried at any time without reconstructing the FXT from scratch.

The rest of this paper is organized as follows. Related work is
discussed in Section 2. In Section 3, we present our motivation
and a description of logged events. Section 4 introduces the
general structure of the novel Frequency XML-based Tree (FXT)
and our algorithm for constructing the FXT. Mining frequent
itemsets and association rules from the FXT is presented in
Section 5. A performance study of our algorithm is discussed in
Section 6. Finally, we conclude and highlight future trends in
Section 7.



2. RELATED WORK 

There are two main types of approaches for XML data mining in
the literature. The first type applies relational data mining
tools to XML data by mapping XML documents to the relational
data model and storing them in a relational database. The second
type applies data mining techniques directly to native XML data
[2], [9], [10]. We are interested in the second type,
specifically in mining frequent itemsets and association rules
from XML data.

Mining association rules using XQuery. 

Wan and Dobbie provide an XQuery implementation of the
well-known Apriori algorithm [1] to extract association rules
from XML documents without any pre-processing or
post-processing [9]. Their algorithm is suited to simple,
well-defined XML formats. It has been extended with a
pre-processing step in order to mine more complex and irregular
XML documents [10]: the authors use XSLT to transform complex
documents into a format that can be mined by Wan and Dobbie's
algorithm. Braga et al. propose XMINE [2], a tool to extract XML
association rules from XML documents. The XMINE operator is
based on XPath and XQuery to express complex mining tasks on the
content and the structure of XML data.

Tree-based mining algorithms. 

Han et al. propose FP-Growth for mining frequent itemsets
without generating candidate itemsets. FP-Growth requires two
database scans for constructing its FP-Tree. Cheung and Zaiane
extend the FP-Tree by proposing a novel data structure called
the CATS Tree [3]. Like the FP-Tree, the CATS Tree allows
frequent pattern mining without generating candidate itemsets.
It allows mining with a single pass over the database, as well
as efficient insertion or deletion of transactions at any time.

To the best of our knowledge, our algorithm is the first
proposed to mine frequent itemsets and association rules from
incremental XML-formatted logged events using XML technologies
(e.g., XPath and XQuery). Table 1 shows the differences between
the FXT, tree-based techniques (i.e., FP-Growth and CATS), and
XQuery-based implementation techniques (i.e., the Apriori
implementation). Although the Apriori implementation mines
association rules from XML data using XQuery [9], it is designed
for static transactions of XML data. Mining association rules
with a different minimum support using the Apriori algorithm
requires regenerating the large itemsets from scratch. Compared
to our algorithm, the Apriori implementation performs worse,
particularly for large databases of transactions. Although CATS
[3] was not proposed for mining XML data, it is based on
constructing an incremental frequency tree, like our algorithm.
The CATS algorithm mines frequent patterns with a complicated
algorithm named FELINE; however, it does not support mining
association rules directly from the CATS tree, because the total
number of transactions is not available.

Table 1: FXT versus related works



3. LOGGING EVENTS 

Several software platforms incrementally log a large number of
events every day, in simple text or XML format. Logged events
are essential to understand and trace the activities of such
platforms. For instance, we are motivated to mine logged events
from XML-based data integration platforms. It is worth noting
that these platforms are developed, managed, and maintained
using XML technologies. Data integration is the process of
extracting data from heterogeneous and distributed sources,
transforming them into a unified format, and loading them into a
repository (namely a warehouse); see Figure 1. Knowledge
discovered from logged events can be employed to self-maintain
and configure the workflow behavior of these systems; how to
achieve this is outside the scope of this paper.

Logged events include much descriptive information about each
activity (e.g., identification, time of occurrence, source,
description, category, etc.). The more information is logged,
the more interesting knowledge can be discovered. In order to
apply our algorithm for mining frequent events and association
rules, we need to pre-process logged events and organize them
into transactions, as illustrated in the following sample.



[Figure 1 diagram. Component labels: Sources, Integration
System, Monitoring, Metadata, Active Rule Engine, Event Log,
Mining Events, Rule Base.]

Figure 1: Data integration system

<transaction id="1" time="2011-04-10 09:16:00">
  <item>A</item> <item>B</item> <item>C</item> <item>D</item>
</transaction>
<transaction id="2" time="2011-04-10 09:16:20">
  <item>C</item> <item>E</item>
</transaction>
<transaction id="3" time="2011-04-10 09:16:40">
  <item>B</item> <item>C</item>
</transaction>
<transaction id="4" time="2011-04-10 09:17:00">
  <item>C</item> <item>D</item> <item>E</item>
</transaction>
<transaction id="5" time="2011-04-10 09:17:20">
  <item>B</item> <item>C</item> <item>D</item>
</transaction>
<transaction id="6" time="2011-04-10 09:17:40">
  <item>A</item> <item>C</item> <item>E</item>
</transaction>

Each transaction has an identifier, its time of occurrence, and
a set of items that represent platform events. The set of events
is assumed to be logged within a time window (time + window).
The corresponding format of logged transactions can be obtained
directly from the originating platforms, or can be produced in a
pre-processing step by using the XSLT language to transform them
into a format that our algorithm can process. The most important
requirement of our algorithm is the list of items of each
transaction, which should be sorted alphabetically for
performance purposes.



4. FREQUENCY XML-BASED TREE (FXT)



4.1 FXT Structure

In order to mine frequent itemsets or association rules, the
frequency of events (or items) needs to be calculated. Hence, we
propose a novel tree structure that contains the frequency of
all logged items, named the Frequency XML-based Tree (FXT). Each
FXT node, except the root node, consists of two entries: item
name and counter. The item name registers which item the node
represents (e.g., $I_i$), and the counter registers the number
of transactions represented by the portion of the path reaching
this node (e.g., $N_i$ or $N_{m|\ldots|i}$). As illustrated in
Figure 2, the FXT is composed of three main levels of nodes.
Firstly, the Root node is the FXT root node. It represents the
total number of logged transactions ($N_{trans}$). Secondly, the
Breadth nodes are all children of the root. Each represents the
count of one item that appeared in any logged transaction.
Thirdly, the Depth nodes are all grandchildren of the root. Each
represents a relative or conditional count of a specific item
given other related items. The depth nodes are organized as a
set of paths, each path corresponding to the itemsets of
specific transactions. In Figure 2, the dashed line annotated by
double slashes "//" means that there are zero or more in-between
nodes in a specific depth path.



Figure 2: FXT structure 

It is worth noting that the FXT can handle both sorted and
unsorted items of incoming transactions, but we observed that
handling sorted items results in a more compact FXT structure
and eases mining frequent itemsets and association rules from
the FXT. Thus, the letters (o, i, m, and z) of items refer to
their ordering. In addition, although the FXT is designed to
manage XML-formatted data, the same concept can be applied to
raw data. Finally, some facts can be deduced from the FXT
structure:

• $N_{trans} = Total(trans)$ refers to the total number of
transactions;

• $N_{trans} \geq N_k$, where $N_k$ can be the count of any
item $k$;

• $N_k \geq N_{u|\ldots|k}$, where $N_{u|\ldots|k}$ is a
conditional count of $I_k$ given $I_u$ and in-between items.
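
The facts listed above can be checked mechanically on an FXT document. The following is a minimal Python sketch, not part of the original implementation. It assumes the FXT is serialized as in the running example, with one element per item and a `counter` attribute. The function name `check_fxt_facts` and the excerpt `FXT_EXCERPT` are hypothetical; the excerpt only reuses counters from the C, D, and E nodes of the example tree.

```python
import xml.etree.ElementTree as ET

# Excerpt of the example FXT (C, D and E nodes only); an illustrative input.
FXT_EXCERPT = """<root counter="6">
  <C counter="6">
    <E counter="3"/>
    <D counter="3">
      <E counter="1"/>
    </D>
  </C>
  <D counter="3"/>
  <E counter="3"/>
</root>"""


def check_fxt_facts(root):
    """Return True if the tree satisfies N_trans >= N_k >= N_{u|...|k}."""
    n_trans = int(root.get("counter"))
    # Breadth nodes: one child of the root per item, holding N_k.
    breadth = {node.tag: int(node.get("counter")) for node in root}
    if any(n_k > n_trans for n_k in breadth.values()):
        return False
    # Depth nodes: every descendant of a breadth node holds a conditional
    # count that must not exceed the breadth count of the same item.
    for top in root:
        for node in top.iter():
            if node is top:
                continue
            if int(node.get("counter")) > breadth.get(node.tag, 0):
                return False
    return True


if __name__ == "__main__":
    print(check_fxt_facts(ET.fromstring(FXT_EXCERPT)))
```

A check of this kind could be run after each batch of insertions as a sanity test; it does not prove that the counters are exact, only that they respect the ordering stated by the facts.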

4.2 FXT Management 

The first phase of our algorithm is to construct the FXT, 
by handling each logged transaction individually. 

4.2.1 Insertion of transactions 

Logged transactions are inserted into the FXT upon arrival. For
each logged transaction, our algorithm follows four steps to
construct the FXT, as presented by Algorithm 1.

Step 1 (incrementing root counter, $N_{trans}$).

The root counter represents the total number of logged
transactions, which can be used to calculate item support.

Step 2 (incrementing breadth).

For each item of the transaction, our algorithm increments the
item counter if the item exists as one of the root's children
(breadth nodes); otherwise, the algorithm creates the item as a
new child of the root and initializes its counter to 1. The
support of any item can easily be calculated later by dividing
the item counter by $N_{trans}$; see Algorithm 2.

Step 3 (incrementing depth).

The algorithm increments the transaction path if it exists;
otherwise, it creates it. While creating the path item by item,
respecting the ordering of the transaction items, the algorithm
takes into account the previous occurrences of related
transaction items. This is reflected when initializing the
counters of the path items. An FXT path may correspond not only
to the same transaction that occurred once or several times, but
also to many transactions that share the same beginning portion
of the path. This step is presented by Algorithm 3.



Algorithm 1: FXT construction

Input: Set of transactions (S)
Output: FXT document
begin
  foreach T ∈ S do
    idx = 0
    (: step 1: increment root counter :)
    root/@counter++
    increment-or-create-breadth(T)
    increment-or-create-depth(T)
    update-other-paths(T, idx)
  end
end



Algorithm 2: Incrementing breadth

Procedure: increment-or-create-breadth(T)
Input: Transaction (T)
begin
  foreach item ∈ T do
    if item ∈ root/* then
      item/@counter++
    else
      (: create new item as root child, initialize its
         counter at 1 :)
      insert root/item  (: item as root child :)
      item/@counter = 1
    end
  end
end



Algorithm 3: Incrementing depth

Procedure: increment-or-create-depth(T)
Input: Transaction (T)
begin
  path = root
  preItemPaths = ""
  for idx = 0 to length(T)-1 do
    path ← path / item[idx]  (: where item is T member :)
    nexidx ← idx+1
    nextItem ← item[nexidx]  (: next item in T :)
    (: paths that have the item anywhere :)
    preItemPaths ← preItemPaths // item[idx]
    (: if inserted item is child of its previous item path :)
    if nextItem ∈ path/* then
      nextItem/@counter++
    (: if next item is descendant of its previous items :)
    else if nextItem ∈ root//preItemPaths//* then
      insert path/nextItem
      (: max preItem counter is incremented as
         initialization counter :)
      nextItem/@counter ←
        Max(root//preItemPaths//nextItem/@counter)+1
    else
      insert path/nextItem
      nextItem/@counter = 1
    end
  end
end

Step 4 (updating other paths).

This step is required to ensure the correct counting of a given
itemset across different FXT paths. For each transaction, some
paths can be generated from the transaction items that differ
from the path built in step 3; these are called other paths. The
algorithm updates only the other paths that already exist in the
FXT. If they do not already exist, the algorithm does not create
them, for the sake of compactness; see Algorithm 4.
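
The four insertion steps can be sketched in Python as follows. This is one reading of the steps under stated assumptions, not the authors' Java implementation: items are valid XML element names, each item occurs at most once per transaction, and step 4 is simplified to enumerating every ordered sub-sequence of the transaction (exponential in the transaction length) instead of the recursive procedure of Algorithm 4. The names `bump`, `step_matches`, and `insert_transaction` are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def bump(node):
    """Increment the counter attribute of a node."""
    node.set("counter", str(int(node.get("counter", "0")) + 1))


def step_matches(root, names):
    """Nodes selected by the path root//n1//n2//...//nk."""
    current = [root]
    for name in names:
        current = [d for n in current for d in n.iter(name) if d is not n]
    return current


def insert_transaction(root, items):
    """Insert one transaction into the FXT rooted at `root`."""
    t = sorted(items)
    if not t:
        return root
    bump(root)                                    # step 1: N_trans
    for name in t:                                # step 2: breadth nodes
        node = root.find(name)
        if node is None:
            node = ET.SubElement(root, name, counter="0")
        bump(node)
    path = root.find(t[0])                        # step 3: depth path
    for idx in range(len(t) - 1):
        nxt = t[idx + 1]
        child = path.find(nxt)
        if child is not None:
            bump(child)
        else:
            # Initialize from the largest counter already recorded for the
            # same item sequence elsewhere in the tree, plus one.
            seen = [int(n.get("counter"))
                    for n in step_matches(root, t[:idx + 2])]
            start = max(seen) + 1 if seen else 1
            child = ET.SubElement(path, nxt, counter=str(start))
        path = child
    for size in range(2, len(t) + 1):             # step 4: other paths
        for sub in combinations(t, size):
            if list(sub) == t[:size]:
                continue                          # prefix, done in step 3
            node = root.find("/".join(sub))
            if node is not None:                  # never create other paths
                bump(node)
    return root


if __name__ == "__main__":
    fxt = ET.Element("root", counter="0")
    for trans in ["ABCD", "CE", "BC", "CDE", "BCD", "ACE"]:
        insert_transaction(fxt, list(trans))
    print(ET.tostring(fxt, encoding="unicode"))
```

Feeding the six sample transactions of Section 3 in order should reproduce the counters of the example FXT. The sub-sequence enumeration is chosen here for brevity; for transactions of about 15 items a recursive traversal restricted to existing paths would be preferable.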

Algorithm 4: Updating other paths

Procedure: update-other-paths(T, idx)
Input: Transaction (T), index of breadth item (idx)
begin
  path = root
  for idx to length(T)-1 do
    path ← path / item[idx]  (: where item is T member :)
    (: skip 1st-then-2nd item sequence to generate other
       possible paths :)
    if idx = 0 then
      nexidx = 2
    else
      nexidx = idx+1
    end
    for nexidx to length(T) do
      leafItem ← item[nexidx]
      (: if leaf item is descendant of its previous items :)
      if leafItem ∈ path/* then
        leafItem/@counter++
      end
    end
    (: repeat starting with next item of T as parent
       of the path :)
    update-other-paths(T, nexidx)
  end
end



4.2.2 Example

Figure 3(a-f) shows the four steps to construct the FXT by
inserting the transactions given in Section 3. Because steps 1
and 2 are always applied directly for all transactions, we focus
on how steps 3 and 4 are applied.

In Figure 3(a) and Figure 3(b), step 3 creates the paths
"root/A/B/C/D" and "root/C/E", respectively. Step 4 is not
evaluated, because there are no other paths available. In
Figure 3(c), in order to initialize the counter of item "C"
according to step 3, the algorithm detects item "C" as a child
of item "B" in the path "root/A/B/C/D". Thus, the counter value
of item "C" in the existing path, plus one, becomes the
initialization value of item "C" in the new path "root/B/C". In
Figure 3(d), step 3 initializes the counter of item "D" at 2,
because item "D" already exists as a child of item "C" in the
path "root/A/B/C/D". In addition, step 4 detects the other path
"root/C/E" in the FXT, thus the counter of item "E" is
incremented. In Figure 3(e), step 3 initializes the counter of
item "D" at 2, because item "D" already exists as a grandchild
and a child of items "B" and "C", respectively, in the path
"root/A/B/C/D". Moreover, step 4 detects a portion of the other
path "root/C/D" in the FXT, thus the counter of item "D" is
incremented. In Figure 3(f), step 3 initializes the counter of
item "C" at 2, because item "C" already exists as a grandchild
of item "A" in the path "root/A/B/C/D". Also, step 4 detects the
other path "root/C/E" in the FXT, thus the counter of item "E"
is incremented. Finally, the constructed FXT is as follows.



<?xml version="1.0"?>
<root counter="6">
  <A counter="2">
    <B counter="1">
      <C counter="1">
        <D counter="1"/>
      </C>
    </B>
    <C counter="2">
      <E counter="1"/>
    </C>
  </A>
  <B counter="3">
    <C counter="3">
      <D counter="2"/>
    </C>
  </B>
  <C counter="6">
    <E counter="3"/>
    <D counter="3">
      <E counter="1"/>
    </D>
  </C>
  <D counter="3"/>
  <E counter="3"/>
</root>



5. MINING FREQUENT ITEMSETS AND ASSOCIATION RULES USING XQUERY

The main objective of constructing the FXT is to mine frequent
itemsets and association rules easily using the XQuery language.
Frequent itemsets are queried by traversing the FXT from breadth
nodes to specific nodes (portions of paths) or to leaf nodes
(complete paths). Frequent itemsets are filtered using a
statistical measure called support, which measures the
proportion of transactions that contain a specific item (or
itemset). A frequent itemset is an itemset whose support is at
least some user-specified minimum support. Frequent itemsets
satisfy the Apriori property, which states that if a given
portion of a path does not satisfy the minimum support, then
neither will any of its descendants [1]. Examples for retrieving
the support of an item and an itemset from the FXT follow.

$$Support(A) = \frac{\text{root/A/@counter}}{\text{root/@counter}}$$

$$Support(B,C,D) = \frac{\text{root/B/C/D/@counter}}{\text{root/@counter}}$$
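
The same two lookups can be sketched in Python. This is only an illustration under assumptions: the FXT is available as an XML string in the format of the example, `FXT_EXCERPT` reuses the A and B branches of that example, and the helper name `support` is hypothetical. The function returns `None` when the requested path is not stored, since the FXT does not materialize every possible path.

```python
import xml.etree.ElementTree as ET

# Excerpt of the example FXT (A and B nodes only); an illustrative input.
FXT_EXCERPT = """<root counter="6">
  <A counter="2"/>
  <B counter="3">
    <C counter="3">
      <D counter="2"/>
    </C>
  </B>
</root>"""


def support(root, itemset):
    """Support of an itemset given as an ordered list of item names."""
    node = root.find("/".join(itemset))       # e.g. "B/C/D"
    if node is None:
        return None                           # path not stored in the FXT
    return int(node.get("counter")) / int(root.get("counter"))


if __name__ == "__main__":
    fxt = ET.fromstring(FXT_EXCERPT)
    print(support(fxt, ["A"]))                # root/A/@counter over root/@counter
    print(support(fxt, ["B", "C", "D"]))      # root/B/C/D/@counter over root/@counter
```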

Example: The following example introduces the function for
generating frequent itemsets from the FXT constructed in the
main example (Section 4.2.2).



declare variable $input := doc("tree.xml")/root;
declare variable $rootCounter := $input/@counter;

declare function local:getFrequentItemsets($parent as xs:string,
    $element as element(*, xs:untyped),
    $minSupport as xs:decimal) {
  let $path := concat($parent, '/', name($element))
  where $element/@counter div $rootCounter >= $minSupport
  return
    (<frequent path="{$path}" count="{$element/@counter}"
        support="{$element/@counter div $rootCounter}"/>,
     for $child in $element/*
     return
       local:getFrequentItemsets($path, $child, $minSupport))
};

(: call the function :)
for $child in $input/*
return
  local:getFrequentItemsets("", $child, 0.25)

Figure 3: Constructing the FXT document. Panels: (a) after
inserting T1 (A B C D); (b) after inserting T2 (C E); (c) after
inserting T3 (B C); (d) after inserting T4 (C D E); (e) after
inserting T5 (B C D); (f) after inserting T6 (A C E).




The result of calling the previous function to get frequent
itemsets with a minimum support of 0.25 is as follows.

<frequent path="/A" count="3" support="0.375"/>
<frequent path="/A/B" count="2" support="0.25"/>
<frequent path="/A/B/D" count="2" support="0.25"/>
<frequent path="/A/C" count="2" support="0.25"/>
<frequent path="/B" count="4" support="0.5"/>
<frequent path="/B/C" count="3" support="0.375"/>
<frequent path="/B/C/D" count="2" support="0.25"/>
<frequent path="/C" count="6" support="0.75"/>
<frequent path="/C/E" count="3" support="0.375"/>
<frequent path="/C/D" count="3" support="0.375"/>
<frequent path="/D" count="4" support="0.5"/>
<frequent path="/E" count="4" support="0.5"/>



Association rules were first introduced in the context of
retail transaction databases [1]. An association rule is an
implication of the form $X \Rightarrow Y$, where the rule body
$X$ and head $Y$ are subsets of the set $I$ of items
($I = \{i_1, i_2, \ldots\}$) within a set of transactions $D$,
and $X \cap Y = \emptyset$. A rule $X \Rightarrow Y$ states that
the transactions $T$ that contain the items in $X$ are likely to
also contain the items in $Y$. Association rules are
characterized by two measures: the support, which measures the
proportion of transactions in $D$ that contain both items $X$
and $Y$; and the confidence, which measures the proportion of
transactions in $D$ containing items $X$ that also contain items
$Y$. $Confidence(X \Rightarrow Y)$ can be expressed as the
conditional probability $p(Y|X)$. Thus, we define:

$$Support(X \Rightarrow Y) = \frac{count(X \cup Y)}{N_{trans}} = \frac{\text{root/X/Y/@counter}}{\text{root/@counter}} \quad (1)$$

$$Confidence(X \Rightarrow Y) = \frac{support(X \cup Y)}{support(X)} = \frac{count(X \cup Y)}{count(X)} = \frac{\text{root/X/Y/@counter}}{\text{root/X/@counter}} \quad (2)$$
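
Equations (1) and (2) translate directly into a tree traversal. The Python sketch below is an illustration only, assuming the FXT listing of Section 4.2.2 as input; the names `FXT` and `generate_rules` are hypothetical. It follows the same recursion scheme as the XQuery function of this section: a child is only explored further when the rule leading to it passes both thresholds.

```python
import xml.etree.ElementTree as ET

# FXT listing of the running example (Section 4.2.2).
FXT = """<root counter="6">
  <A counter="2">
    <B counter="1"><C counter="1"><D counter="1"/></C></B>
    <C counter="2"><E counter="1"/></C>
  </A>
  <B counter="3"><C counter="3"><D counter="2"/></C></B>
  <C counter="6">
    <E counter="3"/>
    <D counter="3"><E counter="1"/></D>
  </C>
  <D counter="3"/>
  <E counter="3"/>
</root>"""


def generate_rules(root, min_sup, min_conf):
    """Return (body path, head item, support, confidence) tuples."""
    n_trans = int(root.get("counter"))
    rules = []

    def visit(parent_path, x):
        path_x = parent_path + "/" + x.tag
        count_x = int(x.get("counter"))
        for y in x:
            count_xy = int(y.get("counter"))
            sup = count_xy / n_trans           # equation (1)
            conf = count_xy / count_x          # equation (2)
            if sup >= min_sup and conf >= min_conf:
                rules.append((path_x, y.tag, sup, conf))
                visit(path_x, y)               # extend the rule body

    for child in root:
        visit("", child)
    return rules


if __name__ == "__main__":
    for body, head, sup, conf in generate_rules(ET.fromstring(FXT), 0.25, 0.5):
        print(body, "=>", head, round(sup, 3), round(conf, 3))
```

Because only counters are read, the thresholds can be changed between calls without touching the tree, which is the property the section relies on.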



Example: The following example introduces the XQuery function
for generating a set of association rules from the FXT
constructed in the main example (Section 4.2.2).



declare function local:generateRules($parent as xs:string,
    $x as element(*, xs:untyped), $min_sup as xs:decimal,
    $min_conf as xs:decimal) {
  let $path_x := concat($parent, '/', name($x))
  return
    (for $y in $x/*
     let $y_given_x := name($y)
     let $support_xy := $y/@counter div $rootCounter
     let $support_x := $x/@counter div $rootCounter
     let $confidence := $support_xy div $support_x
     where $support_xy >= $min_sup and $confidence >= $min_conf
     return
       (<rule body="{$path_x}" head="{$y_given_x}"
            support="{$support_xy}" confidence="{$confidence}"/>,
        local:generateRules($path_x, $y, $min_sup, $min_conf)))
};

(: call the function :)
for $child in $input/*
return
  local:generateRules("", $child, 0.25, 0.5)



The result of calling the previous function to get association
rules with a minimum support of 0.25 and a minimum confidence of
0.5 is as follows.

<rule body="/A" head="B" support="0.25" confidence="0.666"/>
<rule body="/A/B" head="D" support="0.25" confidence="1"/>
<rule body="/A" head="C" support="0.25" confidence="0.666"/>
<rule body="/B" head="C" support="0.375" confidence="0.75"/>
<rule body="/B/C" head="D" support="0.25" confidence="0.666"/>
<rule body="/C" head="E" support="0.375" confidence="0.5"/>
<rule body="/C" head="D" support="0.375" confidence="0.5"/>



Note that it is possible to apply other XQuery functions to
discover some statistics or mine more association rules. For
instance, to query the reverse rule ($Y \Rightarrow X$), a
function is first required to sort the rule body and the rule
head alphabetically in order to calculate
$support(Y \Rightarrow X)$, since
$count(Y \cup X) = count(X \cup Y) = \text{root/X/Y/@counter}$.
However, when calculating $confidence(Y \Rightarrow X)$, the
rule body does not change in the denominator, i.e.,
$support(Y)$; see equation (2).

6. PERFORMANCE STUDY 

We have implemented the FXT construction algorithm using Java
libraries for manipulating XML data structures (i.e., JDom,
SAXPath, and Jaxen). Mining frequent itemsets and association
rules is performed using the XQuery language. We experimented
with different synthetic datasets, ranging from 10 transactions
to 100K transactions. The average transaction length is 15
items. All experiments are performed on a 2.80 GHz PC with 3 GB
of RAM, running Windows 7, with a minimum Java heap size of
128 MB and a maximum Java heap size of 512 MB.

We study the impact of constructing the FXT on machine
resources. Figure 4(a) plots the CPU time for inserting a new
transaction given different FXT sizes. Insertion remains fast
even when the FXT is large (e.g., it takes 5 ms to insert a new
transaction into an FXT of size 100K). Likewise, Figure 4(b)
plots memory usage; our algorithm consumes a small amount of
memory for inserting a new transaction at different FXT sizes.
Figure 4(c) plots the disk storage of the FXT document against
different numbers of transactions. As shown in the figure,
although storage increases with the number of transactions, the
required storage remains small. Owing to the compact FXT
structure, repeated or similar transaction insertions only need
to update item counters, without consuming further storage
space.

Since we are interested in mining XML data using XML
technologies, to the best of our knowledge there is only one
closely related work (i.e., the implementation of the Apriori
algorithm using XQuery [9]). The Apriori algorithm always deals
with a static database of transactions. Figure 5 shows the
performance comparison between our algorithm and the
XQuery-based implementation of Apriori for mining association
rules from XML using XQuery. It shows that our algorithm
consistently performs better than Apriori, especially for larger
numbers of transactions (see Figure 5(a)), and also for
different values of minimum support (see Figure 5(b)). Apriori
generates frequent itemsets and association rules from scratch
each time, while our algorithm constructs the FXT incrementally;
frequent itemsets and association rules can then be queried
directly from the FXT at any time. Moreover, the FXT is very
compact compared with the transactions document used by the
Apriori algorithm.

Finally, we conclude that our algorithm is very efficient in its
consumption of resources. It can also mine frequent itemsets and
association rules for different support and confidence values
without reconstructing its FXT from scratch, which results in
better performance. Additionally, the FXT performs better than
the XQuery-based Apriori implementation.

Figure 4: Machine resources for inserting new transaction to FXT

7. CONCLUSIONS 

In this paper, we propose an incremental approach for mining
association rules from XML logged events. Our approach applies
an incrementing breadth-then-depth algorithm for constructing a
novel frequency XML-based tree structure. The algorithm consists
of four steps for inserting a transaction into the tree. The
constructed tree can be queried directly using the XQuery
language to retrieve frequent itemsets and association rules,
without applying complex data mining techniques. Our algorithm
handles incremental logged events. Thus, it features a single
pass over the dataset, incremental processing of transactions, a
compressed tree structure, fast insertion of new transactions,
fast querying of frequent itemsets and association rules, and
efficient use of limited resources. These features are validated
by implementing the algorithm and evaluating its performance
experimentally.

In the future, we aim at mining association rules from logged
events while taking into account the actual time of logging, and
at discovering the relationships among events with respect to
their logging time. Moreover, we intend to apply our algorithm
for mining XML events logged by our data integration
platform [sj. This algorithm can be used to discover in- 
teresting knowledge, in order to maintain, automate, and 
re-activate the workflow behavior of the ETL tasks. 